When we first discussed possible topics for our project, we narrowed our choices to environment-related data: we were interested in seeing possible trends over time, and the vast quantity of environmental data available piqued our curiosity.

Introduction

Packages Required

#This will allow us to filter through our data 
library(tidyverse)
library(dplyr)
#This will help us plot figures to showcase our findings
library(ggplot2)
#This will help us organize and display our data as necessary 
library(knitr)
library(kableExtra)
#This expands our plot uses 
library(plotly)
#Scientific Notation Disabled (scipen expects a number; a large penalty makes R prefer fixed notation)
options(scipen=999)

Deaths Data

We were excited to base our report on this data because it was relatively tidy, had quite a few categorical variables, and offered options for additional columns to graph.

Our deaths-due-to-air-pollution data set came from Kaggle. It was published by Akshat Giri and last updated 2 years ago, so it is fairly recent. When we first loaded the data, some of the column names were lengthy, so we shortened them to: country, acronym, year, total_deaths, indoor_deaths, outdoor_deaths, and ozone_deaths.

Import the deaths-due-to-air-pollution data

# read.csv() already returns a data frame, so no data.frame() wrapper is needed
deaths_df_old <- read.csv("death-rates-from-air-pollution.csv")
glimpse(deaths_df_old)
## Rows: 6,468
## Columns: 7
## $ Entity                                          <chr> "Afghanistan", "Afghan…
## $ Code                                            <chr> "AFG", "AFG", "AFG", "…
## $ Year                                            <int> 1990, 1991, 1992, 1993…
## $ Air.pollution..total...deaths.per.100.000.      <dbl> 299.4773, 291.2780, 27…
## $ Indoor.air.pollution..deaths.per.100.000.       <dbl> 250.3629, 242.5751, 23…
## $ Outdoor.particulate.matter..deaths.per.100.000. <dbl> 46.44659, 46.03384, 44…
## $ Outdoor.ozone.pollution..deaths.per.100.000.    <dbl> 5.616442, 5.603960, 5.…

We are going to rename a few of the columns and glimpse the data

deaths_df <- deaths_df_old %>%
  rename(country = Entity,
         acronym = Code,
         year = Year,
         total_deaths = Air.pollution..total...deaths.per.100.000.,
         indoor_deaths = Indoor.air.pollution..deaths.per.100.000.,
         outdoor_deaths = Outdoor.particulate.matter..deaths.per.100.000.,
         ozone_deaths = Outdoor.ozone.pollution..deaths.per.100.000.)

glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country        <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ acronym        <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year           <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_deaths   <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0…
## $ indoor_deaths  <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9…
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36…
## $ ozone_deaths   <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739…

Data Variables

Variables that interest us here include:

The data set also takes a closer look at deaths caused by ozone itself, which is considered a component of outdoor air pollution.

World Population Data

The world population data set is also from Kaggle. It was published by Devakumar K. P. and last updated 2 years ago, so it is also recent. A glimpse of the data set shows that its columns are the country name, the year, and a count, which is the population in that year.

Now, let’s take a look at the population data.

world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
## $ Year         <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196…
## $ Count        <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,…

To get a general idea of the deaths data frame we made, let’s make a plot to see what’s happening. This is a plot of indoor versus outdoor deaths around the world by country.
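The plotting code for this figure is not shown, so the following is only a minimal sketch of how such a plot could be drawn. It assumes the renamed columns `indoor_deaths` and `outdoor_deaths`; `deaths_sample` and `indoor_outdoor_plot` are hypothetical names, and the two rows are copied from the 1996 values printed later in this report. Swapping in the full `deaths_df` would reproduce the crowded view described here.

```r
library(ggplot2)

# Small stand-in for deaths_df, using 1996 rows quoted elsewhere in this report
deaths_sample <- data.frame(
  country        = c("Australia", "Canada"),
  year           = c(1996, 1996),
  indoor_deaths  = c(0.3585034, 0.0946226),
  outdoor_deaths = c(22.407071, 20.155243)
)

# One point per country-year: indoor deaths on x, outdoor deaths on y
indoor_outdoor_plot <- ggplot(deaths_sample,
                              aes(x = indoor_deaths, y = outdoor_deaths,
                                  color = country)) +
  geom_point() +
  labs(title = "Indoor vs. Outdoor Air Pollution Deaths by Country",
       x = "Indoor deaths per 100,000",
       y = "Outdoor deaths per 100,000")

indoor_outdoor_plot
```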

This is a mess, and so we chose two countries from each continent (a high-population and a low-population country) to graph.


We wanted the gap between our high- and low-population countries to be consistent across continents, so we came up with a rule for choosing the low-population country: multiply the high-population country's population by 0.10 and find the country whose population most closely matches the result. For example, when we chose the U.S. (which had a population of 331,002,651 at the time the data was recorded), we multiplied this number by 0.10 to get 33,100,265.1; the closest match was Canada, with 37,742,154.

We purposefully left out the countries whose populations were far higher than the rest (Russia, India, and China) because we didn’t want them to skew the data.
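The 10% rule can be sketched in a few lines of R. This is an illustration rather than the code we actually ran: `closest_low_pop` is a hypothetical helper, and the candidate populations are the 1996 figures from the tables in this report (not the figures quoted in the example above).

```r
# 1996 populations taken from the tables in this report
pop_1996 <- data.frame(
  country = c("Canada", "Chile", "Sri Lanka", "Malawi", "New Zealand", "Serbia"),
  count   = c(29610218, 14587370, 18367288, 10022789, 3732000, 7617794)
)

# Hypothetical helper: return the candidate whose population is closest
# to `ratio` times the high-population country's population
closest_low_pop <- function(high_pop, candidates, ratio = 0.10) {
  target <- high_pop * ratio
  as.character(candidates$country[which.min(abs(candidates$count - target))])
}

us_1996 <- 269394000                       # United States, 1996
match_for_us <- closest_low_pop(us_1996, pop_1996)
match_for_us
```

With these figures the target is 26,939,400, and Canada is the closest candidate.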

High-population countries
Country.Name     Year      Count
Australia        1996   18311000
Brazil           1996  164614688
Germany          1996   81914831
Nigeria          1996  110668794
Pakistan         1996  127349290
United States    1996  269394000

Low-population countries
Country.Name     Year      Count
Canada           1996   29610218
Chile            1996   14587370
Sri Lanka        1996   18367288
Malawi           1996   10022789
New Zealand      1996    3732000
Serbia           1996    7617794

Continents:

Combine Data Sets

First, recall the high- and low-population countries we selected from the world population data set, shown in the tables earlier.

Next, we are going to look at the death counts (deaths per 100,000 people) for the high- and low-population countries using the deaths data frame.

country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
Australia AUS 1996 23.04465 0.3585034 22.407071 0.3249375
Australia AUS 1997 22.43025 0.3222224 21.838737 0.3141838
Australia AUS 1998 21.50529 0.2839769 20.960276 0.3048918
Australia AUS 1999 20.40911 0.2590092 19.897091 0.2953354
Australia AUS 2000 19.39822 0.2398763 18.909240 0.2899216
Australia AUS 2001 18.58572 0.2234341 18.118700 0.2836469
Australia AUS 2002 18.11849 0.2105980 17.662269 0.2859938
Australia AUS 2003 17.23830 0.1937083 16.802536 0.2816949
Australia AUS 2004 16.34770 0.1760229 15.932077 0.2785466
Australia AUS 2005 15.41337 0.1599279 15.016089 0.2757150
Australia AUS 2006 14.92239 0.1496469 14.530223 0.2819060
Australia AUS 2007 14.92140 0.1449723 14.514884 0.3042005
Australia AUS 2008 14.64683 0.1383225 14.228709 0.3254648
Australia AUS 2009 14.11563 0.1259313 13.694572 0.3431982
Australia AUS 2010 13.57171 0.1174834 13.140380 0.3647233
Australia AUS 2011 13.72763 0.1119247 13.276676 0.3956796
Australia AUS 2012 12.65973 0.1018626 12.196401 0.4192914
Australia AUS 2013 11.87449 0.0973836 11.384154 0.4530427
Australia AUS 2014 11.47268 0.0931036 10.939491 0.5037056
Australia AUS 2015 11.27679 0.0886376 10.702072 0.5544068
Australia AUS 2016 10.58644 0.0844017 9.974549 0.5955779
Australia AUS 2017 10.79595 0.0833628 10.128111 0.6592419
country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
Canada CAN 1996 22.18101 0.0946226 20.155243 2.192488
Canada CAN 1997 21.92768 0.0877542 19.908473 2.195940
Canada CAN 1998 21.65538 0.0824492 19.634839 2.205681
Canada CAN 1999 21.17703 0.0751278 19.179045 2.189426
Canada CAN 2000 20.26486 0.0681836 18.326999 2.127733
Canada CAN 2001 19.82451 0.0641108 17.938427 2.076464
Canada CAN 2002 19.52428 0.0604824 17.669133 2.047603
Canada CAN 2003 19.17033 0.0564743 17.338627 2.026864
Canada CAN 2004 18.40919 0.0513588 16.629516 1.973025
Canada CAN 2005 17.79268 0.0481667 16.030102 1.954712
Canada CAN 2006 17.14391 0.0447622 15.445519 1.888735
Canada CAN 2007 16.93196 0.0435468 15.229981 1.895259
Canada CAN 2008 16.51814 0.0407468 14.829238 1.883242
Canada CAN 2009 15.76760 0.0380831 14.118647 1.838920
Canada CAN 2010 14.88338 0.0340653 13.281852 1.786430
Canada CAN 2011 14.59934 0.0319160 13.030477 1.756998
Canada CAN 2012 13.82968 0.0307105 12.243601 1.764727
Canada CAN 2013 12.97501 0.0288027 11.410021 1.733997
Canada CAN 2014 12.61872 0.0276959 11.032571 1.746991
Canada CAN 2015 12.21793 0.0270578 10.609097 1.763895
Canada CAN 2016 11.00267 0.0251286 9.397502 1.740834
Canada CAN 2017 10.71662 0.0247705 9.110733 1.739718

Lastly, we will join the population and deaths data by country and year. The last column, ‘Count’, is the population of that country in that year.

country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths Count
Australia AUS 1996 23.04465 0.3585034 22.407071 0.3249375 18311000
Australia AUS 1997 22.43025 0.3222224 21.838737 0.3141838 18517000
Australia AUS 1998 21.50529 0.2839769 20.960276 0.3048918 18711000
Australia AUS 1999 20.40911 0.2590092 19.897091 0.2953354 18926000
Australia AUS 2000 19.39822 0.2398763 18.909240 0.2899216 19153000
Australia AUS 2001 18.58572 0.2234341 18.118700 0.2836469 19413000
Australia AUS 2002 18.11849 0.2105980 17.662269 0.2859938 19651400
Australia AUS 2003 17.23830 0.1937083 16.802536 0.2816949 19895400
Australia AUS 2004 16.34770 0.1760229 15.932077 0.2785466 20127400
Australia AUS 2005 15.41337 0.1599279 15.016089 0.2757150 20394800
Australia AUS 2006 14.92239 0.1496469 14.530223 0.2819060 20697900
Australia AUS 2007 14.92140 0.1449723 14.514884 0.3042005 20827600
Australia AUS 2008 14.64683 0.1383225 14.228709 0.3254648 21249200
Australia AUS 2009 14.11563 0.1259313 13.694572 0.3431982 21691700
Australia AUS 2010 13.57171 0.1174834 13.140380 0.3647233 22031750
Australia AUS 2011 13.72763 0.1119247 13.276676 0.3956796 22340024
Australia AUS 2012 12.65973 0.1018626 12.196401 0.4192914 22733465
Australia AUS 2013 11.87449 0.0973836 11.384154 0.4530427 23128129
Australia AUS 2014 11.47268 0.0931036 10.939491 0.5037056 23475686
Australia AUS 2015 11.27679 0.0886376 10.702072 0.5544068 23815995
Australia AUS 2016 10.58644 0.0844017 9.974549 0.5955779 24190907
Australia AUS 2017 10.79595 0.0833628 10.128111 0.6592419 24601860
country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths Count
Canada CAN 1996 22.18101 0.0946226 20.155243 2.192488 29610218
Canada CAN 1997 21.92768 0.0877542 19.908473 2.195940 29905948
Canada CAN 1998 21.65538 0.0824492 19.634839 2.205681 30155173
Canada CAN 1999 21.17703 0.0751278 19.179045 2.189426 30401286
Canada CAN 2000 20.26486 0.0681836 18.326999 2.127733 30685730
Canada CAN 2001 19.82451 0.0641108 17.938427 2.076464 31020902
Canada CAN 2002 19.52428 0.0604824 17.669133 2.047603 31360079
Canada CAN 2003 19.17033 0.0564743 17.338627 2.026864 31644028
Canada CAN 2004 18.40919 0.0513588 16.629516 1.973025 31940655
Canada CAN 2005 17.79268 0.0481667 16.030102 1.954712 32243753
Canada CAN 2006 17.14391 0.0447622 15.445519 1.888735 32571174
Canada CAN 2007 16.93196 0.0435468 15.229981 1.895259 32889025
Canada CAN 2008 16.51814 0.0407468 14.829238 1.883242 33247118
Canada CAN 2009 15.76760 0.0380831 14.118647 1.838920 33628895
Canada CAN 2010 14.88338 0.0340653 13.281852 1.786430 34004889
Canada CAN 2011 14.59934 0.0319160 13.030477 1.756998 34339328
Canada CAN 2012 13.82968 0.0307105 12.243601 1.764727 34714222
Canada CAN 2013 12.97501 0.0288027 11.410021 1.733997 35082954
Canada CAN 2014 12.61872 0.0276959 11.032571 1.746991 35437435
Canada CAN 2015 12.21793 0.0270578 10.609097 1.763895 35702908
Canada CAN 2016 11.00267 0.0251286 9.397502 1.740834 36109487
Canada CAN 2017 10.71662 0.0247705 9.110733 1.739718 36540268
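A join like the one behind the table above can be sketched as follows. The sample rows are copied from this report; `deaths_sample`, `pop_sample`, and the result names are hypothetical. The sketch also contrasts `inner_join()`, which keeps only country-years present in both data sets, with `right_join()`, which keeps every population row and fills the missing death columns with `NA`.

```r
library(dplyr)

# A few rows copied from the tables in this report
deaths_sample <- data.frame(
  country      = c("Australia", "Canada"),
  year         = c(1996, 1996),
  total_deaths = c(23.04465, 22.18101)
)
pop_sample <- data.frame(
  Country.Name = c("Australia", "Australia", "Canada"),
  Year         = c(1996, 1997, 1996),
  Count        = c(18311000, 18517000, 29610218)
)

# Match on both country and year; the key columns are named differently
# in the two data sets, so the mapping is spelled out in `by`
keys <- c("country" = "Country.Name", "year" = "Year")

joined_inner <- inner_join(deaths_sample, pop_sample, by = keys)
joined_right <- right_join(deaths_sample, pop_sample, by = keys)

nrow(joined_inner)  # only the matched country-years
nrow(joined_right)  # every population row, with NA deaths where unmatched
```

With a right join, calling `na.omit()` afterwards drops the unmatched rows again.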

We also looked at how the data varied by continent.

joined_all <- right_join(deaths_df, world_pop, by=c('country' = 'Country.Name', 'year' = 'Year'))
head(joined_all)
##       country acronym year total_deaths indoor_deaths outdoor_deaths
## 1 Afghanistan     AFG 1990     299.4773      250.3629       46.44659
## 2 Afghanistan     AFG 1991     291.2780      242.5751       46.03384
## 3 Afghanistan     AFG 1992     278.9631      232.0439       44.24377
## 4 Afghanistan     AFG 1993     278.7908      231.6481       44.44015
## 5 Afghanistan     AFG 1994     287.1629      238.8372       45.59433
## 6 Afghanistan     AFG 1995     288.0142      239.9066       45.36714
##   ozone_deaths    Count
## 1     5.616442 12412308
## 2     5.603960 13299017
## 3     5.611822 14485546
## 4     5.655266 15816603
## 5     5.718922 17075727
## 6     5.739174 18110657

North America

north_america <- joined_all %>% filter(country %in% c("United States", "Canada"))
head(na.omit(north_america))
##   country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1  Canada     CAN 1990     23.74844     0.1461597       21.82110     2.024766
## 2  Canada     CAN 1991     23.34036     0.1347912       21.40547     2.046623
## 3  Canada     CAN 1992     23.00947     0.1247982       21.06392     2.069720
## 4  Canada     CAN 1993     23.03293     0.1191081       21.03444     2.135114
## 5  Canada     CAN 1994     22.60288     0.1107671       20.59547     2.152504
## 6  Canada     CAN 1995     22.32566     0.1015955       20.28851     2.193303
##      Count
## 1 27691138
## 2 28037420
## 3 28371264
## 4 28684764
## 5 29000663
## 6 29302311

South America

south_america <- joined_all %>% filter(country %in% c("Brazil", "Chile"))
head(na.omit(south_america))
##   country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1  Brazil     BRA 1990     74.96820      44.08928       28.36460     3.330584
## 2  Brazil     BRA 1991     71.52505      41.12989       27.91653     3.272506
## 3  Brazil     BRA 1992     69.97594      39.07269       28.37737     3.321153
## 4  Brazil     BRA 1993     69.34644      37.34668       29.37063     3.439490
## 5  Brazil     BRA 1994     66.74580      34.60871       29.48986     3.445359
## 6  Brazil     BRA 1995     63.54859      31.67095       29.22721     3.430127
##       Count
## 1 149003223
## 2 151648011
## 3 154259380
## 4 156849078
## 5 159432716
## 6 162019896

Africa

africa <- joined_all %>% filter(country %in% c("Nigeria", "Malawi"))
head(na.omit(africa))
##   country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1  Malawi     MWI 1990     167.7156      153.3657       12.60813     3.518561
## 2  Malawi     MWI 1991     167.8769      153.3428       12.77371     3.541273
## 3  Malawi     MWI 1992     171.1963      156.2008       13.19234     3.618770
## 4  Malawi     MWI 1993     175.2565      159.9608       13.45895     3.686304
## 5  Malawi     MWI 1994     180.9753      164.9773       14.10506     3.784780
## 6  Malawi     MWI 1995     183.4036      166.9812       14.48956     3.847709
##     Count
## 1 9404500
## 2 9600355
## 3 9685973
## 4 9710331
## 5 9745690
## 6 9844415

Europe

europe <- joined_all %>% filter(country %in% c("Germany", "Serbia"))
head(na.omit(europe))
##   country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Germany     DEU 1990     41.91322      1.600590       38.11494     2.724651
## 2 Germany     DEU 1991     40.73815      1.472532       37.08854     2.694316
## 3 Germany     DEU 1992     38.94425      1.367432       35.45345     2.622836
## 4 Germany     DEU 1993     38.25349      1.275528       34.85003     2.623219
## 5 Germany     DEU 1994     36.85860      1.182584       33.58411     2.573705
## 6 Germany     DEU 1995     35.66449      1.109101       32.47285     2.557293
##      Count
## 1 79433029
## 2 80013896
## 3 80624598
## 4 81156363
## 5 81438348
## 6 81678051

Asia

asia <- joined_all %>% filter(country %in% c("Pakistan", "Sri Lanka"))
head(na.omit(asia))
##    country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Pakistan     PAK 1990     144.7155      104.4196       34.80304     10.09603
## 2 Pakistan     PAK 1991     148.0120      105.5436       36.80428     10.35961
## 3 Pakistan     PAK 1992     148.6560      105.2133       37.76577     10.35540
## 4 Pakistan     PAK 1993     149.6526      104.9854       38.95704     10.37194
## 5 Pakistan     PAK 1994     151.1992      105.3557       40.06784     10.44016
## 6 Pakistan     PAK 1995     154.9523      107.2959       41.72728     10.67907
##       Count
## 1 107647921
## 2 110778648
## 3 113911126
## 4 117086685
## 5 120362762
## 6 123776839

Oceania

oceania <- joined_all %>% filter(country %in% c("Australia", "New Zealand"))
head(na.omit(oceania))
##     country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Australia     AUS 1990     26.70503     0.6924006       25.72983    0.3285590
## 2 Australia     AUS 1991     25.91503     0.6172074       25.02097    0.3222915
## 3 Australia     AUS 1992     25.70745     0.5594191       24.86599    0.3286297
## 4 Australia     AUS 1993     24.63559     0.4920491       23.86602    0.3232958
## 5 Australia     AUS 1994     24.38185     0.4454673       23.65269    0.3300999
## 6 Australia     AUS 1995     23.10038     0.3895721       22.43122    0.3244735
##      Count
## 1 17065100
## 2 17284000
## 3 17495000
## 4 17667000
## 5 17855000
## 6 18072000

This is a closer view on the population growth over time in both the high and low populated countries that we selected.

This graph shows the population change over time. Note that the lines for Germany and Australia seem relatively flat, but a closer look shows a gradual increase that is much slower than in the other countries.

These graphs show the same information, but we have mapped the percentage of air-pollution-related deaths to the width of the line to show visually whether deaths increased or decreased over time. It is easier to see in the high-population countries, but air-pollution-related deaths do decrease over time.
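The code for these figures is not shown. One possible way to draw a population line whose width follows the deaths column is sketched below, under the assumption that `total_deaths` is the variable mapped to width. The rows are copied from the joined table earlier in this report, and `pop_deaths` and `pop_plot` are hypothetical names.

```r
library(ggplot2)

# Rows for Australia and Canada copied from the joined table in this report
pop_deaths <- data.frame(
  country      = rep(c("Australia", "Canada"), each = 3),
  year         = rep(c(1996, 2006, 2017), times = 2),
  total_deaths = c(23.04465, 14.92239, 10.79595,
                   22.18101, 17.14391, 10.71662),
  Count        = c(18311000, 20697900, 24601860,
                   29610218, 32571174, 36540268)
)

# Population over time; line width follows the deaths column.
# `linewidth` requires ggplot2 >= 3.4.0 (older versions use `size` instead).
pop_plot <- ggplot(pop_deaths,
                   aes(x = year, y = Count, color = country, group = country)) +
  geom_line(aes(linewidth = total_deaths), lineend = "round") +
  labs(title = "Population Over Time",
       x = "Year",
       y = "Population",
       linewidth = "Deaths per 100,000")

pop_plot
```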

Death Count

Which country has the highest average death count?

Let’s make tables of the high- and low-population countries and their respective average death counts (deaths per 100,000 people) due to pollution.

country          hp_average_death
Australia                17.76815
Brazil                   48.42928
Germany                  28.10988
Nigeria                 112.30157
Pakistan                144.33463
United States            26.35827

country          lp_average_death
Canada                   18.18542
Chile                    36.51321
Malawi                  147.77167
New Zealand              15.92536
Serbia                   80.66558
Sri Lanka                69.60383

We wanted to take a closer look at the death count and see which country has the highest average death count. In the tables we made you can see that Pakistan had the highest average death count at 144.33 for the high populated countries. Malawi had the highest average death count at 147.77 for the low populated countries, which is higher than Pakistan.

Let’s see how this is different from continent to continent

#Mean total deaths for each country, grouped by continent
deaths_north <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  summarize(north_america_deaths = mean(total_deaths))


deaths_south <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  summarize(south_america_deaths = mean(total_deaths))


deaths_africa <- na.omit(africa)  %>% 
  group_by(country) %>% 
  summarize(africa_deaths = mean(total_deaths))


deaths_europe <- na.omit(europe)  %>% 
  group_by(country) %>% 
  summarize(europe_deaths = mean(total_deaths))


deaths_asia <- na.omit(asia)  %>% 
  group_by(country) %>% 
  summarize(asia_deaths = mean(total_deaths))


deaths_oceania <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  summarize(oceania_deaths = mean(total_deaths))


#Table to view continent deaths 
kable(deaths_north, caption = "North America Average Death Count")
North America Average Death Count
country north_america_deaths
Canada 18.18542
United States 26.35827
kable(deaths_south, caption = "South America Average Death Count")
South America Average Death Count
country south_america_deaths
Brazil 48.42928
Chile 36.51321
kable(deaths_africa, caption = "Africa Average Death Count")
Africa Average Death Count
country africa_deaths
Malawi 147.7717
Nigeria 112.3016
kable(deaths_asia, caption = "Asia Average Death Count")
Asia Average Death Count
country asia_deaths
Pakistan 144.33463
Sri Lanka 69.60383
kable(deaths_europe, caption = "Europe Average Death Count")
Europe Average Death Count
country europe_deaths
Germany 28.10988
Serbia 80.66558
kable(deaths_oceania, caption = "Oceania Average Death Count")
Oceania Average Death Count
country oceania_deaths
Australia 17.76815
New Zealand 15.92536

When we look at the average death count by continent, we can see that the Oceania countries had the fewest deaths overall: Australia averaged roughly 17.8 and New Zealand 15.9. The African countries had the most: Malawi averaged 147.8 and Nigeria 112.3.


Here’s a graph to clearly visualize the previous table

To get a better visualization we created a bar graph of the average deaths in both the high and low populated countries. In this high-population graph you can see that Pakistan is at the highest and Australia is at the lowest. In the low-population graph you can see that Malawi is at the highest and New Zealand is at the lowest.
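A bar graph like the high-population one described here could be built as in the sketch below. The averages are copied from the `hp_average_death` table above; `hp_avg` and `hp_bar` are hypothetical names, and the original styling is not known.

```r
library(ggplot2)

# Averages copied from the high-population table in this report
hp_avg <- data.frame(
  country = c("Australia", "Brazil", "Germany",
              "Nigeria", "Pakistan", "United States"),
  hp_average_death = c(17.76815, 48.42928, 28.10988,
                       112.30157, 144.33463, 26.35827)
)

# One bar per country, ordered from lowest to highest average
hp_bar <- ggplot(hp_avg,
                 aes(x = reorder(country, hp_average_death),
                     y = hp_average_death)) +
  geom_col() +
  labs(title = "Average Deaths in High Population Countries",
       x = "Country",
       y = "Average deaths per 100,000")

hp_bar
```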


So we’ve looked at the deaths due to pollution, but what percentage of the population was affected?

In order to get rid of the leading zeros, and clean up the y-axis, we multiplied the ‘percent_high’ and ‘percent_low’ by 100,000 since the data was per 100,000 when calculating deaths.

Country.Name     average_population
Australia                  21085646
Brazil                    188017856
Germany                    81914553
Nigeria                   146828087
Pakistan                  166653684
United States             299036073

Country.Name     average_population
Canada                     32874340
Chile                      16466330
Malawi                     13442531
New Zealand                 4193041
Serbia                      7358242
Sri Lanka                  19758408

Now that we’ve looked at the deaths due to pollution, we wanted to see what share of the population was actually affected. The tables above show the average populations of the high- and low-population countries. Among the high-population countries Pakistan leads with 12.1, and among the low-population countries Malawi leads with 130.9. Because these values were multiplied by 100,000, they are scaled figures rather than literal percentages (a share of the population cannot exceed 100%).
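One way to sanity-check figures like these is to convert the rates directly, assuming the death columns are rates per 100,000 people (as the original column names state). This is a sketch of the unit conversion only, not the calculation used for the figures above; `rates` and its new columns are hypothetical names, and the inputs are copied from the tables in this report.

```r
# Average rate (per 100,000) and average population, copied from this report
rates <- data.frame(
  country            = c("Pakistan", "Malawi"),
  avg_death_rate     = c(144.33463, 147.77167),
  average_population = c(166653684, 13442531)
)

# A rate per 100,000 becomes a percentage when divided by 1,000
# (rate / 100000 * 100)
rates$percent_of_population <- rates$avg_death_rate / 1000

# Approximate number of deaths per year implied by the rate
rates$estimated_deaths <- rates$avg_death_rate / 100000 * rates$average_population

rates
```

Under this reading, both countries come out at well under 1% of the population per year.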

Pollution Types

Which type of pollution has the greatest number of deaths?

Looking between indoor, outdoor, and ozone pollution deaths, we can see which pollutant-type had the greatest death count.

country avg_indoor avg_outdoor avg_ozone
Pakistan 87.7427944 50.52063 10.440656
Nigeria 75.8755074 35.21678 2.117076
Brazil 19.4258385 26.84194 2.740342
Germany 0.7170881 25.47078 2.343892
Australia 0.2485867 17.20789 0.360452
United States 0.1656402 22.79947 3.915093

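Indoor deaths are the largest category in Pakistan and Nigeria, where the average indoor death count is far higher than in the other countries; in Brazil, Germany, Australia, and the United States, outdoor pollution accounts for the most deaths. In all three categories, Pakistan had the highest death count.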
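The code that builds the high-population table is not shown. A minimal sketch that mirrors the low-population chunk might look like this; it assumes the result is called `high_poll` (the name used by the plotting code later on) and sorts by `avg_indoor` to match the row order of the table above. `deaths_sample` is a hypothetical stand-in for `deaths_df`, with rows copied from outputs in this report.

```r
library(dplyr)

# Stand-in for deaths_df with a few rows copied from this report
deaths_sample <- data.frame(
  country        = c("Pakistan", "Pakistan", "Germany", "Germany", "Canada"),
  indoor_deaths  = c(104.4196, 105.5436, 1.600590, 1.472532, 0.1461597),
  outdoor_deaths = c(34.80304, 36.80428, 38.11494, 37.08854, 21.82110),
  ozone_deaths   = c(10.09603, 10.35961, 2.724651, 2.694316, 2.024766)
)

high_countries <- c("Australia", "Brazil", "Germany",
                    "Nigeria", "Pakistan", "United States")

# Average each pollutant type per country, highest indoor average first
high_poll <- deaths_sample %>%
  filter(country %in% high_countries) %>%
  group_by(country) %>%
  summarize(avg_indoor  = mean(indoor_deaths),
            avg_outdoor = mean(outdoor_deaths),
            avg_ozone   = mean(ozone_deaths)) %>%
  arrange(desc(avg_indoor))

high_poll
```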

#Low Population Pollutant Averages 
low_poll <- deaths_df %>% 
  group_by(country) %>% 
  filter(country %in% c('Canada', 'Chile', 'Malawi', 'Serbia', 'Sri Lanka', 'New Zealand')) %>% 
  select(country, indoor_deaths, outdoor_deaths, ozone_deaths) %>% 
  summarize(avg_indoor = mean(indoor_deaths), avg_outdoor = mean(outdoor_deaths), avg_ozone = mean(ozone_deaths))

kable(low_poll)
country avg_indoor avg_outdoor avg_ozone
Canada 0.0651156 16.38423 1.9697041
Chile 8.6932699 27.17442 0.8504919
Malawi 132.1891749 13.81151 3.3870514
New Zealand 0.2908622 15.56872 0.0727512
Serbia 35.8762796 42.71254 2.9395671
Sri Lanka 44.5428441 24.77233 0.4304406

Among the low-population countries, Malawi has a higher average indoor death count than any of our other selected countries, and it also has the highest ozone average. Its outdoor average, however, is the lowest of the six; Serbia has the highest outdoor average.


In these graphs, it is much easier to see the discrepancies in the indoor deaths of the countries. Malawi, Pakistan, and Nigeria have dots located high on the graph indicating high death counts.

# High Outdoor Air Pollution
# Point size is a fixed value, so it is set in geom_point() rather than mapped in aes()
h_outdoor <- ggplot(high_poll, aes(x = country, y = avg_outdoor, color = avg_ozone)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "violet") +
  geom_point(size = 5) +
  labs(title = "Outdoor Air Pollution Deaths in High Population Countries") +
  xlab("Country") +
  ylab("Average Deaths")
ggplotly(h_outdoor)
# Low Outdoor Air Pollution
l_outdoor <- ggplot(low_poll, aes(x = country, y = avg_outdoor, color = avg_ozone)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "violet") +
  geom_point(size = 5) +
  labs(title = "Outdoor Air Pollution Deaths in Low Population Countries") +
  xlab("Country") +
  ylab("Average Deaths")
ggplotly(l_outdoor)

In these graphs, outdoor pollution deaths are displayed on the y-axis, and ozone-related deaths are indicated by the color gradient. The higher a dot sits on the graph, the higher the outdoor-related deaths; the darker the shade of pink (toward violet), the higher the ozone-related deaths.

Among our high-population countries, Pakistan had high outdoor deaths and high ozone deaths, while the other high-population countries were lower in ozone deaths. Among the low-population countries, Serbia had the highest outdoor deaths and the second-highest ozone deaths, while Malawi had the greatest ozone deaths.


Pollution Over Time

Let’s look at the previous two decades and compare the death count: has there been a change?

To see if there’s been a change over time we looked at 2 decades. The first decade was from 1996-2006. Here we can see that there’s a general decrease in both high and low populated countries over the years even though some countries have a higher death count than others, such as Nigeria, Pakistan, Malawi, and Tonga.

This is the first decade 1996-2006
country          High_Deaths_96  High_Deaths_01  High_Deaths_06
Australia              23.04465        18.58572        14.92239
Brazil                 60.67757        49.46436        41.46829
Germany                34.72325        28.38756        23.83654
Nigeria               136.08978       123.05129       102.26653
Pakistan              155.42988       151.25352       146.09296
United States          29.99271        28.93114        25.93369

country           Low_Deaths_96   Low_Deaths_01   Low_Deaths_06
Canada                 22.18101        19.82451        17.14391
Chile                  46.36829        37.43188        30.99058
Malawi                183.14179       165.41702       137.54033
Serbia                 93.44700        83.18333        79.04236
Sri Lanka              85.28997        72.16239        66.04455
Tonga                 100.66078        95.27073        88.65608
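A table laid out like the ones above, with one column per snapshot year, can be sketched with `pivot_wider()` from tidyr. The code behind the original tables is not shown, so this is only one possible approach; `deaths_sample`, `label`, and `first_decade` are hypothetical names, and the values are copied from the high-population table.

```r
library(dplyr)
library(tidyr)

# Values for two countries copied from the first-decade table in this report
deaths_sample <- data.frame(
  country      = rep(c("Australia", "Brazil"), each = 3),
  year         = rep(c(1996, 2001, 2006), times = 2),
  total_deaths = c(23.04465, 18.58572, 14.92239,
                   60.67757, 49.46436, 41.46829)
)

# Keep the three snapshot years, build a column label from the last two
# digits of the year, then spread the years into columns
first_decade <- deaths_sample %>%
  filter(year %in% c(1996, 2001, 2006)) %>%
  mutate(label = paste0("High_Deaths_", substr(as.character(year), 3, 4))) %>%
  select(country, label, total_deaths) %>%
  pivot_wider(names_from = label, values_from = total_deaths)

first_decade
```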

The second decade was from 2007-2017 and there is still a decrease in both high and low populated countries over the years. You can also see that the death counts in the second decade are lower than the death counts in the first decade. If you look at Pakistan you can see the death count started with 155.4 in 1996 and went down to 143.8 in 2007.

This is the second decade 2007-2017
country          High_Deaths_07  High_Deaths_12  High_Deaths_17
Australia              14.92140        12.65973        10.79595
Brazil                 40.42460        35.39069        30.32108
Germany                23.45850        20.91536        19.82826
Nigeria                98.90306        84.22324        81.22147
Pakistan              143.81724       133.93887       123.21548
United States          25.11756        21.98194        18.82515

country           Low_Deaths_07   Low_Deaths_12   Low_Deaths_17
Canada                 16.93196        13.82968        10.71662
Chile                  30.53130        27.31475        24.29921
Malawi                132.12253       116.27470       104.93508
Serbia                 76.65752        72.77354        62.57853
Sri Lanka              66.05987        59.22433        38.46264
Tonga                  87.81178        79.49336        70.72940

Let’s see if there is variation by continent. Here are some tables for the first decade (1996-2006) and second decade (2007-2017) grouped by continent.
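The chunks that follow repeat the same filter-and-summarize pattern for every continent and year. As an optional alternative, the pattern could be wrapped in a small function. This is a sketch, not the code used to produce the tables: `deaths_in_year` is a hypothetical helper, and `north_sample` holds North America values copied from this report.

```r
library(dplyr)

# Hypothetical helper: total_deaths for each country in one year, with the
# result column named after the last two digits of the year
deaths_in_year <- function(df, yr) {
  col_name <- paste0("avg_deaths_", substr(as.character(yr), 3, 4))
  df %>%
    filter(year == yr) %>%
    group_by(country) %>%
    summarize(!!col_name := mean(total_deaths))
}

# North America values copied from the tables in this report
north_sample <- data.frame(
  country      = rep(c("Canada", "United States"), each = 3),
  year         = rep(c(1996, 2001, 2006), times = 2),
  total_deaths = c(22.18101, 19.82451, 17.14391,
                   29.99271, 28.93114, 25.93369)
)

north_96 <- deaths_in_year(north_sample, 1996)

# One summary table per snapshot year, ready to pass to kable(list(...))
north_first_decade <- lapply(c(1996, 2001, 2006),
                             function(y) deaths_in_year(north_sample, y))
north_96
```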

#North America 1996-2006
north_96 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

north_01 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

north_06 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(north_96,north_01,north_06), caption = "North America Deaths 1996-2006")
North America Deaths 1996-2006
country avg_deaths_96
Canada 22.18101
United States 29.99271
country avg_deaths_01
Canada 19.82451
United States 28.93114
country avg_deaths_06
Canada 17.14391
United States 25.93369
# North America 2007-2017
north_07 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

north_12 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

north_17 <- na.omit(north_america)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(north_07,north_12,north_17), caption = "North America Deaths 2007-2017")
North America Deaths 2007-2017
country avg_deaths_07
Canada 16.93196
United States 25.11756
country avg_deaths_12
Canada 13.82968
United States 21.98194
country avg_deaths_17
Canada 10.71662
United States 18.82515
#South America 1996-2006
south_96 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

south_01 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

south_06 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(south_96,south_01,south_06), caption = "South America Deaths 1996-2006")
South America Deaths 1996-2006
country avg_deaths_96
Brazil 60.67757
Chile 46.36829
country avg_deaths_01
Brazil 49.46436
Chile 37.43188
country avg_deaths_06
Brazil 41.46829
Chile 30.99058
# South America 2007-2017
south_07 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

south_12 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

south_17 <- na.omit(south_america)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(south_07,south_12,south_17), caption = "South America Deaths 2007-2017")
South America Deaths 2007-2017
country avg_deaths_07
Brazil 40.4246
Chile 30.5313
country avg_deaths_12
Brazil 35.39069
Chile 27.31475
country avg_deaths_17
Brazil 30.32108
Chile 24.29921
# Africa 1996-2006
africa_96 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

africa_01 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

africa_06 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(africa_96,africa_01,africa_06), caption = "Africa Deaths 1996-2006")
Africa Deaths 1996-2006
country avg_deaths_96
Malawi 183.1418
Nigeria 136.0898
country avg_deaths_01
Malawi 165.4170
Nigeria 123.0513
country avg_deaths_06
Malawi 137.5403
Nigeria 102.2665
# Africa 2007-2017
africa_07 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

africa_12 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

africa_17 <- na.omit(africa)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(africa_07,africa_12,africa_17), caption = "Africa Deaths 2007-2017")
Africa Deaths 2007-2017
country avg_deaths_07
Malawi 132.12253
Nigeria 98.90306
country avg_deaths_12
Malawi 116.27470
Nigeria 84.22324
country avg_deaths_17
Malawi 104.93508
Nigeria 81.22147
#Europe 1996-2006
europe_96 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

europe_01 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

europe_06 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(europe_96,europe_01,europe_06), caption = "Europe Deaths 1996-2006")
Europe Deaths 1996-2006
country   avg_deaths_96   avg_deaths_01   avg_deaths_06
Germany        34.72325        28.38756        23.83654
Serbia         93.44700        83.18333        79.04236
#Europe 2007-2017
europe_07 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

europe_12 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

europe_17 <- na.omit(europe)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(europe_07,europe_12,europe_17), caption = "Europe Deaths 2007-2017")
Europe Deaths 2007-2017
country   avg_deaths_07   avg_deaths_12   avg_deaths_17
Germany        23.45850        20.91536        19.82826
Serbia         76.65752        72.77354        62.57853
#Asia 1996-2006
asia_96 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

asia_01 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

asia_06 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(asia_96,asia_01,asia_06), caption = "Asia Deaths 1996-2006")
Asia Deaths 1996-2006
country     avg_deaths_96   avg_deaths_01   avg_deaths_06
Pakistan        155.42988       151.25352       146.09296
Sri Lanka        85.28997        72.16239        66.04455
#Asia 2007-2017
asia_07 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

asia_12 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

asia_17 <- na.omit(asia)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(asia_07,asia_12,asia_17), caption = "Asia Deaths 2007-2017")
Asia Deaths 2007-2017
country     avg_deaths_07   avg_deaths_12   avg_deaths_17
Pakistan        143.81724       133.93887       123.21548
Sri Lanka        66.05987        59.22433        38.46264
#Oceania 1996-2006
oceania_96 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 1996) %>% 
  summarize(avg_deaths_96 = mean(total_deaths))

oceania_01 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 2001) %>% 
  summarize(avg_deaths_01 = mean(total_deaths))

oceania_06 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 2006) %>% 
  summarize(avg_deaths_06 = mean(total_deaths))

kable(list(oceania_96,oceania_01,oceania_06), caption = "Oceania Deaths 1996-2006")
Oceania Deaths 1996-2006
country       avg_deaths_96   avg_deaths_01   avg_deaths_06
Australia          23.04465        18.58572        14.92239
New Zealand        21.15988        16.91014        13.76706
#Oceania 2007-2017
oceania_07 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 2007) %>% 
  summarize(avg_deaths_07 = mean(total_deaths))

oceania_12 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 2012) %>% 
  summarize(avg_deaths_12 = mean(total_deaths))

oceania_17 <- na.omit(oceania)  %>% 
  group_by(country) %>% 
  filter(year == 2017) %>% 
  summarize(avg_deaths_17 = mean(total_deaths))

kable(list(oceania_07,oceania_12,oceania_17), caption = "Oceania Deaths 2007-2017")
Oceania Deaths 2007-2017
country       avg_deaths_07   avg_deaths_12   avg_deaths_17
Australia          14.92140        12.65973       10.795952
New Zealand        13.58658        10.91224        8.598757
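
The per-continent blocks above all follow one pattern: drop missing rows, keep a single year, group by country, and average. As a minimal sketch, that pattern could be wrapped in a helper that handles several years at once. The helper name summarize_years and the toy data frame africa_demo are assumptions made for illustration (the toy values are copied from the Africa 1996-2006 table), and the resulting columns use full years (avg_deaths_1996) rather than the two-digit suffixes used in the report.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the `africa` subset, filled with values from the table above.
africa_demo <- tibble(
  country = rep(c("Malawi", "Nigeria"), each = 3),
  year = rep(c(1996, 2001, 2006), times = 2),
  total_deaths = c(183.1418, 165.4170, 137.5403,
                   136.0898, 123.0513, 102.2665)
)

# Hypothetical helper: one row per country, one column per requested year.
summarize_years <- function(df, years) {
  df %>%
    na.omit() %>%
    filter(year %in% years) %>%
    group_by(country, year) %>%
    summarize(avg_deaths = mean(total_deaths), .groups = "drop") %>%
    pivot_wider(names_from = year, values_from = avg_deaths,
                names_prefix = "avg_deaths_")
}

africa_first <- summarize_years(africa_demo, c(1996, 2001, 2006))
africa_first
```

Because the result is a single data frame, it could be passed to kable() directly instead of as a list of three tables.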

Based on these tables, we can see that the Oceania countries had the lowest death rates across both decades. In Europe, although Germany is a high-populated country, its death rate was significantly lower than that of Serbia, a low-populated country. The same pattern appeared in Africa, where Malawi, a low-populated country, had a higher death rate overall than Nigeria, a high-populated country. Africa also saw the greatest decrease in average death rate. Malawi’s rate fell by roughly 45.6 deaths per 100,000 from 1996 to 2006 (from about 183.1 to 137.5).
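
As a quick check on the size of Malawi’s decrease, here is a minimal sketch that uses only the two values printed in the Africa 1996-2006 table. The values are rates per 100,000 people, as the original column names indicate; the variable names are chosen here for illustration.

```r
# Average death rates (per 100,000) copied from the Africa 1996-2006 table.
malawi_96 <- 183.1418
malawi_06 <- 137.5403

# Absolute and relative change over the first decade.
abs_change <- malawi_96 - malawi_06
pct_change <- 100 * abs_change / malawi_96

round(abs_change, 1)  # 45.6 deaths per 100,000
round(pct_change, 1)  # 24.9 percent
```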


Let’s graph the previous tables!

In the first graph we faceted the high-populated countries, with the average death rate in 1996 on the x-axis, the average death rate in 2001 on the y-axis, and the points colored by the average death rate in 2006. The darker the point, the higher the death rate was in 2006. To get a closer look at all three years, we also created individual bar graphs for 1996, 2001, and 2006.

High Population first decade (1996-2006).


In the first graph we faceted the low-populated countries, with the average death rate in 1996 on the x-axis, the average death rate in 2001 on the y-axis, and the points colored by the average death rate in 2006. The darker the point, the higher the death rate was in 2006. To get a closer look at all three years, we also created individual bar graphs for 1996, 2001, and 2006.

Low Population first decade (1996-2006).

#Low Population Deaths 1996-2006
p_low_first <- all_low_first %>%
  group_by(country) %>% 
  ggplot(aes(x= Low_Deaths_96, y= Low_Deaths_01, color= Low_Deaths_06)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  geom_point() +
  facet_wrap(~country)

interact_low_first<- p_low_first + labs(title="Low Populated Countries", x="1996 Deaths", y="2001 Deaths")

ggplotly(interact_low_first)
#Low Population Deaths 1996
# Bars are filled by country, so no continuous color scale is needed here.
p_low_first_96 <- all_low_first %>%
  ggplot(aes(x = country, y = Low_Deaths_96, fill = country)) +
  xlab("Country") +
  ylab("1996 Deaths") +
  geom_col()

interact_low_first_96 <- p_low_first_96 + labs(title = "Low Populated Countries")

ggplotly(interact_low_first_96)
#Low Population Deaths 2001
p_low_first_01 <- all_low_first %>%
  ggplot(aes(x = country, y = Low_Deaths_01, fill = country)) +
  xlab("Country") +
  ylab("2001 Deaths") +
  geom_col()

interact_low_first_01 <- p_low_first_01 + labs(title = "Low Populated Countries")

ggplotly(interact_low_first_01)
#Low Population Deaths 2006
p_low_first_06 <- all_low_first %>%
  ggplot(aes(x = country, y = Low_Deaths_06, fill = country)) +
  xlab("Country") +
  ylab("2006 Deaths") +
  geom_col()

interact_low_first_06 <- p_low_first_06 + labs(title = "Low Populated Countries")

ggplotly(interact_low_first_06)

In the first graph we faceted the high-populated countries, with the average death rate in 2007 on the x-axis, the average death rate in 2012 on the y-axis, and the points colored by the average death rate in 2017. The darker the point, the higher the death rate was in 2017. To get a closer look at all three years, we also created individual bar graphs for 2007, 2012, and 2017.

High Population second decade (2007-2017).


In the first graph we faceted the low-populated countries, with the average death rate in 2007 on the x-axis, the average death rate in 2012 on the y-axis, and the points colored by the average death rate in 2017. The darker the point, the higher the death rate was in 2017. To get a closer look at all three years, we also created individual bar graphs for 2007, 2012, and 2017.

Low Population second decade (2007-2017).

#Low Population Deaths 2007-2017
p_low_second <- all_low_second %>%
  group_by(country) %>% 
  ggplot(aes(x= Low_Deaths_07, y= Low_Deaths_12, color= Low_Deaths_17)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  geom_point() +
  facet_wrap(~country)

interact_low_second<- p_low_second + labs(title="Low Populated Countries", x="2007 Deaths", y="2012 Deaths")

ggplotly(interact_low_second)
#Low Population Deaths 2007
# Bars are filled by country, so no continuous color scale is needed here.
p_low_second_07 <- all_low_second %>%
  ggplot(aes(x = country, y = Low_Deaths_07, fill = country)) +
  xlab("Country") +
  ylab("2007 Deaths") +
  geom_col()

interact_low_second_07 <- p_low_second_07 + labs(title = "Low Populated Countries")

ggplotly(interact_low_second_07)
#Low Population Deaths 2012
p_low_second_12 <- all_low_second %>%
  ggplot(aes(x = country, y = Low_Deaths_12, fill = country)) +
  xlab("Country") +
  ylab("2012 Deaths") +
  geom_col()

interact_low_second_12 <- p_low_second_12 + labs(title = "Low Populated Countries")

ggplotly(interact_low_second_12)
#Low Population Deaths 2017
p_low_second_17 <- all_low_second %>%
  ggplot(aes(x = country, y = Low_Deaths_17, fill = country)) +
  xlab("Country") +
  ylab("2017 Deaths") +
  geom_col()

interact_low_second_17 <- p_low_second_17 + labs(title = "Low Populated Countries")

ggplotly(interact_low_second_17)
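
The three yearly bar graphs could also be drawn as one faceted figure by first reshaping the data to long format. The following is only a sketch: low_demo is a toy stand-in for all_low_second, and we assume that table holds one row per country with the Low_Deaths_07, Low_Deaths_12, and Low_Deaths_17 columns used above. The values are the 2007-2017 averages from the tables for three of the low-populated countries.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for `all_low_second` (assumed layout: one row per country).
low_demo <- tibble(
  country = c("Malawi", "Serbia", "Sri Lanka"),
  Low_Deaths_07 = c(132.12253, 76.65752, 66.05987),
  Low_Deaths_12 = c(116.27470, 72.77354, 59.22433),
  Low_Deaths_17 = c(104.93508, 62.57853, 38.46264)
)

# Reshape to one row per country and year; "07" becomes 2007, and so on.
low_long <- low_demo %>%
  pivot_longer(starts_with("Low_Deaths_"),
               names_to = "year", names_prefix = "Low_Deaths_",
               values_to = "deaths") %>%
  mutate(year = 2000 + as.integer(year))

# One panel per year, bars filled by country.
p_low_years <- ggplot(low_long, aes(x = country, y = deaths, fill = country)) +
  geom_col() +
  facet_wrap(~year) +
  labs(title = "Low Populated Countries", x = "Country", y = "Deaths per 100,000")
```

The resulting p_low_years object can be passed to ggplotly() in the same way as the individual plots.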

By comparing each pollutant type, we can determine which year and country had the highest number of deaths.

Let’s focus on Indoor Deaths first:

It is interesting to see the decrease in both the high and low-populated countries over time. The decrease is easiest to see in Malawi’s indoor deaths. Pakistan and Malawi were the two countries with the highest indoor death counts in our subset. Despite being a low-populated country, Malawi had much larger death counts.


Outdoor Deaths:

For outdoor deaths, Pakistan leads the high-populated countries. Nigeria shows an interesting increase from 2014 to 2015, and Germany and the United States stay neck and neck over the years. Among the lower-populated countries, Serbia is in the lead. Sri Lanka has a steep decline from 2015 to 2016, but before that it was very close to Tonga. Comparing the top two countries, Pakistan and Serbia, Pakistan is drastically higher. Pakistan’s outdoor deaths peaked in 2011, and Serbia’s peaked in 1997.
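
Peak years like these can also be found programmatically rather than read off a graph. The sketch below is an illustration only: the outdoor series is not reproduced in this section, so it uses the total-death averages from the Asia and Europe tables for the six years shown there. Applying the same steps to the outdoor_deaths column of the full data would be the way to confirm the peak years stated above.

```r
library(dplyr)

# Total-death averages (per 100,000) copied from the Asia and Europe tables.
peaks_demo <- tibble(
  country = rep(c("Pakistan", "Serbia"), each = 6),
  year = rep(c(1996, 2001, 2006, 2007, 2012, 2017), times = 2),
  total_deaths = c(155.42988, 151.25352, 146.09296, 143.81724, 133.93887, 123.21548,
                   93.44700, 83.18333, 79.04236, 76.65752, 72.77354, 62.57853)
)

# For each country, keep the single row with the highest value.
peak_years <- peaks_demo %>%
  group_by(country) %>%
  slice_max(total_deaths, n = 1) %>%
  ungroup()

peak_years
```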


Ozone Deaths

Among the high-populated countries, the U.S. has the second-highest ozone-related deaths, and its numbers even increase in some years. We also see an increase in deaths in Pakistan, Chile, and Sri Lanka. Malawi remains one of the highest countries for ozone-related deaths, even though it saw the largest decrease over time.

Which is worse?

outdoor or indoor pollution?

Let’s reintroduce a graph we looked at earlier. This time, we will combine the pollutant types.

In the high-populated countries, two countries show higher indoor air pollutant deaths, whereas the others show higher outdoor deaths. In the lower-populated countries, half show higher outdoor air pollutant deaths and half show higher indoor air pollutant deaths. Therefore, we cannot conclude which is worse.
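
The indoor versus outdoor comparison can also be tabulated per country. This is a minimal sketch, not the method used for the graph: compare_sources is a hypothetical helper, and deaths_demo holds only the first two Afghanistan rows as printed by glimpse() at the start of the report. It compares indoor_deaths against outdoor_deaths (particulate matter only) and leaves ozone deaths out.

```r
library(dplyr)

# First two Afghanistan rows as printed by glimpse() earlier in the report.
deaths_demo <- tibble(
  country = c("Afghanistan", "Afghanistan"),
  year = c(1990, 1991),
  indoor_deaths = c(250.3629, 242.5751),
  outdoor_deaths = c(46.44659, 46.03384)
)

# Hypothetical helper: average each source per country and flag the larger one.
compare_sources <- function(df) {
  df %>%
    group_by(country) %>%
    summarize(avg_indoor = mean(indoor_deaths, na.rm = TRUE),
              avg_outdoor = mean(outdoor_deaths, na.rm = TRUE),
              .groups = "drop") %>%
    mutate(higher = if_else(avg_indoor > avg_outdoor, "indoor", "outdoor"))
}

source_summary <- compare_sources(deaths_demo)
source_summary
```

Running the helper on the renamed deaths_df, filtered to the selected countries, would give one row per country, and counting the values in the higher column would show how the countries split between the two sources.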

Summary

There are many factors affecting environmental data. Another area of interest would be the cultural, geographical, and historical climate of each country, to see whether any significant events skewed the numbers in the years where large increases and decreases occurred.

To make the data more manageable, we had to reduce the number of countries we looked at. This is why we selected one high-populated country and one low-populated country from each continent. We thought that having a high and a low-populated country would give us a general idea of how air pollution affected each continent. Furthermore, we excluded Russia, China, and India because their populations are so large that they would not be a good representation of average populations.

Overall, this project has sparked our interest in environmental concerns. After seeing the impact that air pollution has, we are inclined to be more conscious of how we treat our planet. We hope that this data has brought more awareness to others as well!

Sources